22 research outputs found
Deformable Part-based Fully Convolutional Network for Object Detection
Existing region-based object detectors are limited to regions with fixed box
geometry to represent objects, even if those are highly non-rectangular. In
this paper we introduce DP-FCN, a deep model for object detection which
explicitly adapts to shapes of objects with deformable parts. Without
additional annotations, it learns to focus on discriminative elements and to
align them, and simultaneously brings more invariance for classification and
geometric information to refine localization. DP-FCN is composed of three main
modules: a Fully Convolutional Network to efficiently maintain spatial
resolution, a deformable part-based RoI pooling layer to optimize positions of
parts and build invariance, and a deformation-aware localization module
explicitly exploiting displacements of parts to improve accuracy of bounding
box regression. We experimentally validate our model and show significant
gains. DP-FCN achieves state-of-the-art performances of 83.1% and 80.9% on
PASCAL VOC 2007 and 2012 with VOC data only.Comment: Accepted to BMVC 2017 (oral
Toward Reliable Human Pose Forecasting with Uncertainty
Recently, there has been an arms race of pose forecasting methods aimed at
solving the spatio-temporal task of predicting a sequence of future 3D poses of
a person given a sequence of past observed ones. However, the lack of unified
benchmarks and limited uncertainty analysis have hindered progress in the
field. To address this, we first develop an open-source library for human pose
forecasting, featuring multiple models, datasets, and standardized evaluation
metrics, with the aim of promoting research and moving toward a unified and
fair evaluation. Second, we devise two types of uncertainty in the problem to
increase performance and convey better trust: 1) we propose a method for
modeling aleatoric uncertainty by using uncertainty priors to inject knowledge
about the behavior of uncertainty. This focuses the capacity of the model in
the direction of more meaningful supervision while reducing the number of
learned parameters and improving stability; 2) we introduce a novel approach
for quantifying the epistemic uncertainty of any model through clustering and
measuring the entropy of its assignments. Our experiments demonstrate up to
improvements in accuracy and better performance in uncertainty
estimation
Comparative Analysis of Chromosome Counts Infers Three Paleopolyploidies in the Mollusca
The study of paleopolyploidies requires the comparison of multiple whole genome sequences. If the branches of a phylogeny on which a whole-genome duplication (WGD) occurred could be identified before genome sequencing, taxa could be selected that provided a better assessment of that genome duplication. Here, we describe a likelihood model in which the number of chromosomes in a genome evolves according to a Markov process with one rate of chromosome duplication and loss that is proportional to the number of chromosomes in the genome and another stochastic rate at which every chromosome in the genome could duplicate in a single event. We compare the maximum likelihoods of a model in which the genome duplication rate varies to one in which it is fixed at zero using the Akaike information criterion, to determine if a model with WGDs is a good fit for the data. Once it has been determined that the data does fit the WGD model, we infer the phylogenetic position of paleopolyploidies by calculating the posterior probability that a WGD occurred on each branch of the taxon tree. Here, we apply this model to a molluscan tree represented by 124 taxa and infer three putative WGD events. In the Gastropoda, we identify a single branch within the Hypsogastropoda and one of two branches at the base of the Stylommatophora. We also identify one or two branches near the base of the Cephalopoda
Vitamin D and Its Role During Pregnancy in Attaining Optimal Health of Mother and Fetus
Despite its discovery a hundred years ago, vitamin D has emerged as one of the most controversial nutrients and prohormones of the 21st century. Its role in calcium metabolism and bone health is undisputed but its role in immune function and long-term health is debated. There are clear indicators from in vitro and animal in vivo studies that point to vitamin D’s indisputable role in both innate and adaptive immunity; however, the translation of these findings to clinical practice, including the care of the pregnant woman, has not occurred. Until recently, there has been a paucity of data from randomized controlled trials to establish clear cut beneficial effects of vitamin D supplementation during pregnancy. An overview of vitamin metabolism, states of deficiency, and the results of recent clinical trials conducted in the U.S. are presented with an emphasis on what is known and what questions remain to be answered
Conception d’architectures profondes pour l’interprétation de données visuelles
Nowadays, images are ubiquitous through the use of smartphones and social media. It then becomes necessary to have automatic means of processing them, in order to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. It is tackled with deep convolutional neural networks, under the Deep Learning paradigm. One drawback of this approach is the need for labeled data to learn from. Since precise annotations are time-consuming to produce, bigger datasets can be built with partial labels. We design global pooling functions to work with them and to recover latent information in two cases: learning spatially localized and part-based representations from image- and object-level supervisions respectively. We address the issue of efficiency in end-to-end learning of these representations by leveraging fully convolutional networks. Besides, exploiting additional annotations on available images can be an alternative to having more images, especially in the data-deficient regime. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to effectively learn from this auxiliary supervision under this framework.Aujourd’hui, les images sont omniprésentes à travers les smartphones et les réseaux sociaux. Il devient alors nécessaire d’avoir des moyens de traitement automatiques, afin d’analyser et d’interpréter les grandes quantités de données disponibles. Dans cette thèse, nous nous intéressons à la détection d’objets, i.e. au problème d’identification et de localisation de tous les objets présents dans une image. Cela peut être vu comme une première étape vers une interprétation complète des scènes. Nous l’abordons avec des réseaux de neurones profonds à convolutions, sous le paradigme de l’apprentissage profond. Un inconvénient de cette approche est le besoin de données annotées pour l’apprentissage. Puisque les annotations précises sont longues à produire, des jeux de données plus gros peuvent être construits à l’aide d’annotations partielles. Nous concevons des fonctions d’agrégation globale pour travailler avec celles-ci et retrouver l’information latente dans deux cas : l’apprentissage de représentations spatialement localisée et par parties, à partir de supervisions aux niveaux de l’image et des objets respectivement. Nous traitons la question de l’efficacité dans l’apprentissage de bout en bout de ces représentations en tirant parti de réseaux complètement convolutionnels. En outre, l’exploitation d’annotations supplémentaires sur les images disponibles peut être une alternative à l’obtention de plus d’images, particulièrement quand il y a peu d’images. Nous formalisons ce problème comme un type spécifique d’apprentissage multi-tâche avec un objectif primaire, et concevons une méthode pour apprendre de cette supervision auxiliaire
Detecting 32 Pedestrian Attributes for Autonomous Vehicles
Pedestrians are arguably one of the most safety-critical road users to consider for autonomous vehicles in urban areas. In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes. These encompass visual appearance and behavior, and also include the forecasting of road crossing, which is a main safety concern. For this, we introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way. Each field spatially locates pedestrian instances and aggregates attribute predictions over them. This formulation naturally leverages spatial context, making it well suited to low resolution scenarios such as autonomous driving. By increasing the number of attributes jointly learned, we highlight an issue related to the scales of gradients, which arises in MTL with numerous tasks. We solve it by normalizing the gradients coming from different objective functions when they join at the fork in the network architecture during the backward pass, referred to as fork-normalization. Experimental validation is performed on JAAD, a dataset providing numerous attributes for pedestrian analysis from autonomous vehicles, and shows competitive detection and attribute recognition results, as well as a more stable MTL training
End-to-End Learning of Latent Deformable Part-Based Representations for Object Detection
International audienc
WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation
International audienceThis paper introduces WILDCAT, a deep learning method which jointly aims at aligning image regions for gaining spatial invariance and learning strongly localized features. Our model is trained using only global image labels and is devoted to three main visual recognition tasks: image classification, weakly supervised pointwise object lo-calization and semantic segmentation. WILDCAT extends state-of-the-art Convolutional Neural Networks at three major levels: the use of Fully Convolutional Networks for maintaining spatial resolution, the explicit design in the network of local features related to different class modalities, and a new way to pool these features to provide a global image prediction required for weakly supervised training. Extensive experiments show that our model significantly out-performs the state-of-the-art methods